Efficient Methods for Dealing with Missing Data in Supervised Learning

نویسندگان

  • Volker Tresp
  • Ralph Neuneier
  • Subutai Ahmad
چکیده

Subutai Ahmad Interval Research Corporation 1801-C Page Mill R<;l. Palo Alto, CA 94304 We present efficient algorithms for dealing with the problem of missing inputs (incomplete feature vectors) during training and recall. Our approach is based on the approximation of the input data distribution using Parzen windows. For recall, we obtain closed form solutions for arbitrary feedforward networks. For training, we show how the backpropagation step for an incomplete pattern can be approximated by a weighted averaged backpropagation step. The complexity of the solutions for training and recall is independent of the number of missing features. We verify our theoretical results using one classification and one regression problem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Eecient Methods for Dealing with Missing Data in Supervised Learning

We present eecient algorithms for dealing with the problem of missing inputs (incomplete feature vectors) during training and recall. Our approach is based on the approximation of the input data distribution using Parzen windows. For recall, we obtain closed form solutions for arbitrary feedforward networks. For training, we show how the backpropagation step for an incomplete pattern can be app...

متن کامل

Missing or Inapplicable: Treatment of Incomplete Continuous-valued Features in Supervised Learning

Real-world data are often riddled with data quality problems such as noise, outliers and missing values, which present significant challenges for supervised learning algorithms to effectively classify them. This paper explores the ill-effects of inapplicable features on the performance of supervised learning algorithms. In particular, we highlight the difference between missing and inapplicable...

متن کامل

Performance evaluation of different estimation methods for missing rainfall data

There are numerous methods to estimate missing values of which some are used depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data in Sarab synoptic station, east Azerbaijan province, Iran was randomly considered missing values. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...

متن کامل

Imputation of Missing Data Using Machine Learning Techniques

A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine le...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994